Skip to content

Conversation

@wfchandler
Copy link
Contributor

System messages on illumos are logged to /var/adm/messages. In the event of a hardware or kernel issue, an archive of these logs can be for understanding the nature of the failure.

Currently we archive all Oxide service logs from the global zone and any zones we create, but not /var/adm/messages. Update the archiver to include the latter as well.

System messages on illumos are logged to `/var/adm/messages`. In the
event of a hardware or kernel issue, an archive of these logs can be
for understanding the nature of the failure.

Currently we archive all Oxide service logs from the global zone and any
zones we create, but not `/var/adm/messages`. Update the archiver to
include the latter as well.
@wfchandler
Copy link
Contributor Author

Validated on berlin that rotated copies of /var/adm/messages are being archived:

BRM42220082 # ls -l /pool/ext/3516350f-4b93-4c70-a0ad-5fe1f2547492/crypt/debug/global
total 18
-rw-r--r--   1 root     root          12 Dec  1 19:27 messages.1764616962
-rw-r--r--   1 root     root          12 Dec  1 19:32 oxide-sled-agent:default.log.1764617303

@wfchandler wfchandler requested a review from jgallagher December 1, 2025 19:44
for zone in oxz_zones {
let logdir = if zone.global() {
PathBuf::from("/var/svc/log")
zone.path.join("var/svc/log")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm fine with this, but just to understand) this is a conditional within the "global zone" case anyway - so behaviorally this is the same as PathBuf::from ("/var/svc/log"), right? Or are there cases where zone.path is something other than / for global zones?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this question. If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir. (Are they archiving the current system's /var/adm/messages, with this change?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or are there cases where zone.path is something other than / for global zones?

AFAIK no, except in tests. Prior to this PR none of the tests in this module created a fake global zone.

If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir.

The opposite, without this switching to a relative path at

let adm_logdir = zone.path.join("var/adm");
the updated test would fail because it was trying to archive the actual system logs on my illumos box, rather than the test logs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misread the diff 🤦 That makes sense.

&self,
debug_dir: &DebugDataset,
logdir: PathBuf,
log_name_pattern: &str,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is a non-pub function, it's probably worth adding docs for this (e.g., it's a prefix; it interacts with the include_live arguments, it's searching within the logdir path, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, added in 6e5ba44

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I am hoping to do some significant rework of a lot of this code soon. Some of it mechanical, like renaming the module and creating submodules. But I'm also hoping to make some more substantial changes to rework file collection so that it's easier to add and test support for collecting new files. I can plan to base my work on this branch, if you expect this to land soon, or I can incorporate this feature into the work I plan to do (but that may not finish in time for R18). Either works for me.

if include_live {
let pattern = logdir
.join("*.log*")
.join(format!("{log_name_pattern}*"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like kind of a dicey interface to me, but I feel like that's true for a lot of the existing pieces here. I'm hoping to rework the way a bunch of the log archival works to make it more testable anyway so this is probably fine for now.

debug_dir, logdir, "*.log", zone_name, false,
)
.await?;
if zone.global() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file useful in non-global zones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never had a reason to check /var/adm/messages other than in the global zone so far, on the other hand, there's no harm in capturing them from all zones as they're quite small and they might be useful in the future.

Updated with ddc56e2.

.await?;
if zone.global() {
let adm_logdir = zone.path.join("var/adm");
self.archive_logs_from_zone_path(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(probably not suggesting any changes here)

I expect that this will drop the "messages" file alongside all the SMF logs in the debug dataset. In an ideal world, I'd rather we organized these files more clearly (consumers of this information have to know that "messages" is not an SMF service but a different thing). But this is already a problem today. The only real new impact of this PR is that oxlog may wind up reporting these files under the messages service or something strange like that? Fixing this seems beyond the scope of this PR. I'll file an issue for the general problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, today oxlog will filter out the archived messages files

const OX_SMF_PREFIXES: [&str; 2] = ["oxide-", "system-illumos-"];
/// Return true if the provided file name appears to be a valid log file for an
/// Oxide-managed SMF service.
///
/// Note that this operates on the _file name_. Any leading path components will
/// cause this check to return `false`.
pub fn is_oxide_smf_log_file(filename: impl AsRef<str>) -> bool {
// Log files are named by the SMF services, with the `/` in the FMRI
// translated to a `-`.
let filename = filename.as_ref();
OX_SMF_PREFIXES
.iter()
.any(|prefix| filename.starts_with(prefix) && filename.contains(".log"))
}
because the file name does not start with one of our expected prefixes. Thanks for opening the issue!

BRM42220082 # ls -l /pool/ext/3516350f-4b93-4c70-a0ad-5fe1f2547492/crypt/debug/global
total 18
-rw-r--r--   1 root     root          12 Dec  1 19:27 messages.1764616962
-rw-r--r--   1 root     root          12 Dec  1 19:32 oxide-sled-agent:default.log.1764617303

for zone in oxz_zones {
let logdir = if zone.global() {
PathBuf::from("/var/svc/log")
zone.path.join("var/svc/log")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this question. If anything I'd worry this would break the tests, because they seem to construct their own Zone with a path in a tempdir. (Are they archiving the current system's /var/adm/messages, with this change?)


let tempdir_path = tempdir.path().as_str().to_string();
let global_zone = Zone::from_str(&format!(
"0:global:running:{tempdir_path}::ipkg:shared"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate temporary directory? If I'm reading this right, this test is running the archiver to collect data from this temporary directory. It used to only pretend to have one zone, and now it has two. But wouldn't the first archival delete the first set of log files so that the second one has nothing to collect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good call.

The tests works despite sharing the same directory because the the global zone archiver was looking in var/svc/log and var/adm, vs root/var/svc/log for other zone.

Updated with b386fe2.

)
.await?;

let adm_logdir = zone.path.join("var/adm");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not quite right for non-global zones.

If I'm understanding right:

  • for the global zone, zone.path() is /. This is the zone's path and also the zone's root filesystem path.
  • for the non-global zone, zone.path() is something else. This is the zone's path, but it's not the root filesystem. That's under root.

I'd suggest adding up at L1102 or so:

let zone_root = if zone.global() { zone.path().to_owned() } else { zone.path().join("root"); };

Then you can just have:

let logdir = zone_root.join("var/svc/log");

for both cases. And then here, you can use zone_root.join("/var/adm").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! Fixed with 1a457af

@davepacheco
Copy link
Collaborator

Looks good aside from the comment wrapping!

@wfchandler wfchandler removed the request for review from jgallagher December 5, 2025 16:47
Comment on lines +1157 to +1161
// Archives log files found in `logdir` for zone `zone_name` to the destination debug dataset.
//
// `log_name_pattern` should be a glob pattern that matches against file names, e.g., `*.log`, `mylog`.
// If `include_live` is `true`, this will archive all logs, matching on `{log_name_pattern}*`.
// If it is `false`, only rotated logs will be archived, matching on `{log_name_pattern}.[0-9]`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice this before. These lines are over 80 columns. Can you rewrap them? (rustfmt cannot enforce long lines in comments without unstable options, unfortunately.)

@davepacheco
Copy link
Collaborator

(Sorry, I had queued up the comment about the wrapped lines yesterday, then submitted a comment on the PR itself thinking it had submitted that comment, but it hadn't. I just now submitted the note I was referring to.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants